Method and system for using a standby server to improve redundancy in a dual-node data storage system

ABSTRACT

Methods are provided in which a standby server, a first main server, and a second main server to control shared input/output (I/O) adapters in a storage system are provided. The standby server is in communication with the first main server and the second main server, and the storage system is configured to operate as a dual node active system. The methods include activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server. Systems and physical computer storage media are also provided.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates in general to computers, and more particularly to method, system, and computer program product embodiments for improving reliability in a computer storage environment.

2. Description of the Related Art

Storage area networks, or SANs, consist of multiple storage devices connected by one or more fabrics. Storage devices can be of two types: host systems that access data and storage subsystems that are providers of data. In a large distributed computer system, a plurality of host systems are typically connected to a number of direct access storage devices (DASDs) making up the storage subsystems. A storage controller controls read and write operations between host computers of the host systems and the DASDs. The DASDs are comprised of hard disk drives (HDDs) and may be organized in a redundant array of independent disks, i.e., a RAID array. A RAID array is comprised of multiple, independent disks organized into a large, high-performance logical disk. A controller stripes data across the multiple disks in the array and accesses the disks in parallel to achieve higher data transfer rates.

To reduce the risk of system failure due to failure of a hard disk drive in a DASD system such as a RAID array, redundancy in the form of error-correcting codes to tolerate disk failures is typically employed. Further, to reduce a risk of failure at a point within the storage controller, the storage controller is typically designed to handle hardware failure. For example, the storage controller can have two storage clusters, each of which provides for selective connection between a host computer and a DASD. Each cluster has a cache and a nonvolatile storage unit (NVS). The cache buffers frequently used data. When a request is made to write data to a DASD attached to the storage controller, the storage controller may cache the data and delay writing the data to a DASD. Caching data can save time as writing operations involve time consuming mechanical operations. The cache and NVS in each cluster can intercommunicate, allowing for recovery and reconfiguration of the storage controller in the event that one of the memory elements is rendered unavailable. For instance, if one cluster and its cache fail, the NVS in the other cluster maintains a back-up of the cache in the failed cluster.

Other storage controllers include multiple storage clusters or have an “n-way” architecture. In such configurations, if one cluster and its cache fail, the NVS in the other clusters maintains a back-up of the cache in the failed cluster.

SUMMARY OF THE INVENTION

From time to time, maintenance and/or upgrade functions are performed on the storage system. During these operations, dual-cluster storage controllers may fail over to a single node configuration. As a result, the system becomes less fault tolerant than in the normal dual node configuration. Though use of multiple storage clusters improves fault tolerance by allowing other clusters to continue to operate despite failover of one cluster, multiple storage cluster configurations are more complex than dual node configurations. Consequently, challenges associated with designing such configurations include costs, efficiency, and hardware/software. Accordingly, there is a need in the art for improved methods, systems, and programs for improving redundancy within the storage system when one server is in a fail mode.

Various embodiments of methods of storing data are provided that improve redundancy during a fail mode of a main server. In one embodiment, a method includes providing a standby server, a first main server, and a second main server to control shared input/output (I/O) adapters in a storage system, where the standby server is in communication with the first main server and the second main server, and the storage system is configured to operate as a dual node system, and activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server.

Also provided are improved systems for storing data. One system comprises a first main server, a second main server in communication with the first main server, a standby server in communication with the first main server and the second main server, and shared input/output adapters in communication with the standby server, the first main server, and the second main server. The storage system is configured to operate as a dual node system, and the standby server is adapted to activate in response to receiving a communication from the first main server of a fail mode of the second main server.

Physical computer storage mediums (e.g., an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing) comprising a computer program product method for controlling a storage system comprising a first main server, a second main server in communication with the first main server, and a standby server in communication with the first main server and the second main server, wherein the storage system is configured to operate as a dual node system with shared input/output adapters are also provided. One physical computer storage medium comprises computer code for activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 is a block diagram of a dual node storage control system with which the present invention may be practiced;

FIG. 2 is a flow chart diagram of an exemplary method for storing data in a computer storage environment; and

FIG. 3 is a flow chart diagram of another exemplary method for storing data in a computer storage environment.

DETAILED DESCRIPTION OF THE DRAWINGS

The illustrated embodiments below provide systems and methods for storing data in a computer storage environment. Also provided are physical computer storage mediums for controlling a storage system comprising computer code activating a standby server in response to receiving a communication from a first main server of a fail mode of a second main server.

FIG. 1 is a block diagram of the components and architecture of a preferred embodiment of a storage system 2. The storage system 2 is configured to operate as a dual node system. As used herein, the term “dual node system” is defined as a system that continuously operates with two operational servers, such that multiple host and disk adapters coupled to the servers are continuously provided with two paths for data flow. In this regard, the storage control system includes a first main server 4 and a second main server 6 that communicate with each other. The first and second main servers 4, 6 each comprise a processor 8, 14, cache 10, 16, and non-volatile storage 12, 18. Each component of the servers 4, 6 (e.g., processor 8, 14, cache 10, 16, and non-volatile storage 12, 18) is configured in a substantially similar manner. It will be appreciated that main server 4 and main server 6 are referred to as “first” and “second”, respectively, for simplicity and ease of understanding. Thus, main server 4 can be referred to as the second main server and main server 6 can be referred to as the first main server, in other embodiments.

To provide redundancy and reduce system performance impact during a period of time in which one of the main servers 4, 6 is not operational, a standby server 20 is also provided. The standby server 20 is in communication with the first main server 4. Alternatively, the standby server 20 can additionally communicate with the second main server 6. Generally, the standby server 20 comprises processor 22, cache 24, and non-volatile storage 26 that generally are configured in a substantially similar manner to those included as part of the first and second main servers 4, 6. In particular, the standby server 20 is configured to control the system 2 in an event in which one of the first or second main servers 4, 6 becomes non-operational. Additionally, the standby server 20 is further configured to assume an identity of the non-operational main server 4, 6 so that the system allows continuous operation of shared input/output (I/O) adapters 30, 32, 34, 36. Moreover, the standby server 20 is adapted to relinquish control of the system 2 when the non-operational main server 4, 6 is repaired.
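The identity hand-off described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the names `Server`, `route`, `assume_identity`, and `relinquish_identity`, and the labels such as "node-B", are hypothetical. The sketch assumes that a shared I/O adapter addresses a server by a logical identity, so that moving the identity to the standby server leaves the adapter-side configuration unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str                       # physical machine label (hypothetical)
    identity: Optional[str] = None  # logical node identity seen by the adapters
    active: bool = False


def route(node_id, servers):
    """Return the name of the active server currently holding node_id, if any."""
    for server in servers:
        if server.active and server.identity == node_id:
            return server.name
    return None


def assume_identity(standby, failed):
    """Standby takes over the logical identity of the non-operational server."""
    standby.identity, failed.identity = failed.identity, None
    standby.active, failed.active = True, False


def relinquish_identity(standby, repaired):
    """Standby returns the identity once the main server has been repaired."""
    repaired.identity, standby.identity = standby.identity, None
    repaired.active, standby.active = True, False
```

In this sketch an adapter keeps routing to the same logical identity before, during, and after the hand-off; only the server behind that identity changes.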

The main servers 4, 6 and standby server 20 communicate over connections 40 that enable processor inter-communication to manage configuring operations performed with respect to the shared devices, such as the shared I/O adapters 30, 32, 34, 36. In alternative embodiments, there may be only one fabric connecting all adapters 30, 32, 34, 36. Alternatively, more than one fabric may be employed for communication.

In addition to communicating with each other, the servers 4, 6, 20 communicate with input/output devices such as shared I/O adapters 30, 32, 34, 36, DASD 46, and an external host or switched fabric 50. As illustrated, at least one path exists between each of the servers 4, 6, 20 and each adapter 30, 32, 34, 36, and the shared I/O adapters 30, 32, 34, 36 actively communicate with two of the servers 4, 6, 20 (which two depends on which servers are active) to write both cache and NVS copies of data. In an embodiment, four shared I/O adapters 30, 32, 34, 36 are included. In other embodiments, more or fewer shared I/O adapters are employed. DASD 46, which includes multiple RAID arrays, is a magnetic storage unit such as a hard disk drive, disks, tapes, terminals, LANs (Local Area Networks), printers or other input/output devices or input/output subsystems. Although a single DASD 46 is illustrated, more can be included. For example, one or more shared I/O adapters 30, 32, 34, 36 can be coupled to one or more DASDs. The external host or switched fabric 50 can be a single server or multiple servers or mainframes connected either directly to the storage system 2 or indirectly through network switches. They can have different data formats (e.g., fixed-block or CKD) and use different hardware mediums and software protocols for connection (like fibre-channel or iSCSI).

The main servers 4, 6, the standby server 20, and the input/output devices are connected by a bay switch 44. The bay switch 44 includes shared I/O resources and can comprise a dual master bus, which may be controlled by one of the first or second main server 4, 6 or standby server 20. In other embodiments, the bay switch 44 may include technology to allow the bus to operate at its own clock speed and provide a buffer to buffer data transferred across the bay switch 44.

As noted above, the inclusion of the standby server 20 with the main servers 4, 6 provides the system 2 with redundancy so that in an event in which one of the main servers 4, 6 fails, the standby server 20 can step in to allow the system 2 to continue to operate as a dual node storage system. Various events may cause one of the main servers 4, 6 to become non-operational. For example, some events may be scheduled, such as during maintenance, upgrades or repairs. In such case, the system 2 may be provided with a protocol that coordinates hand-off of shared resources from one server to another server.

FIG. 2 is a flow chart diagram of a method 200 for storing data in a computer storage environment, according to an embodiment. Before operation of one of the main servers is interrupted, the main servers (e.g., first and second main servers 4, 6) operate in an active mode, and the standby server (e.g., standby server 20) operates in an inactive mode, step 202. In particular, the first and second main servers operate in a dual node configuration to manage and perform input/output operations communicated from shared I/O adapters (e.g., shared I/O adapters 30, 32, 34, 36). The input/output operations are temporarily stored in memory storage areas of the active main servers. For example, each main server contains a copy of its own data stored in its cache and a copy of modified cache data of the other main server in its NVS. Thus, for example, the first main server 4 would include a copy of its data in cache 10 and a copy of modified cache data of the second main server 6 in NVS 12, while the second main server 6 would include a copy of its data in cache 16 and a copy of modified cache data of the first main server 4 in NVS 18.
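The cross-mirroring arrangement described above, in which each main server keeps its own data in cache and the partner's modified data in NVS, can be sketched in Python as follows. The class `Node` and the functions `write` and `backup_of_partner_cache` are hypothetical names chosen for illustration, and plain dictionaries stand in for cache and NVS; this is a sketch of the idea, not the disclosed implementation.

```python
class Node:
    """A main server with a volatile cache and a non-volatile store (NVS)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # this server's own data
        self.nvs = {}    # copy of the partner server's modified cache data


def write(owner, partner, track, data):
    """Buffer a write: one copy in the owner's cache, one in the partner's NVS."""
    owner.cache[track] = data
    partner.nvs[track] = data


def backup_of_partner_cache(survivor):
    """Modified data of the partner that survives if the partner and its cache fail."""
    return dict(survivor.nvs)
```

With two nodes `first` and `second`, a call such as `write(first, second, "track-7", b"payload")` leaves the data in `first.cache` and in `second.nvs`, so the loss of either server alone does not lose the modified data.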

When one of the main servers, for example, the second main server 6, is taken down and becomes non-operational, this non-operational main server enters a “failback to service” condition. In the failback to service condition, a protocol is provided to the non-operational main server to voluntarily relinquish control of shared resources, and instructions are provided to the operational main server to direct the standby server to remain active. The non-operational main server may engage a failback to service condition during a code update or other MES scenario.

During the failback to service condition, the modified data stored on NVS of the operational main server is destaged through the bay switch 44 and the shared I/O adapters 30, 32, 34, 36 to other storage areas (e.g., a DASD) to preserve the integrity of the data. In this way, the system 2 retains the modified data associated with the non-operational main server despite server shut down. Additionally, control of the resources previously shared between the two main servers is transferred to the operational server.

After failback to service on the non-operational main server is complete, the operational main server (e.g., first main server 4) provides a communication to the standby server indicating a fail mode of the non-operational main server (e.g., failback to service condition of the second main server 6), step 204. In response to receiving the communication, the standby server activates, step 206. The standby server enters a “failback to dual” condition to gain control of all of the operations performed by the non-operational main server and of the portion of the shared resources previously controlled by the non-operational main server to provide full redundancy for the system. Specifically, the standby server assumes the identity of the non-operational main server to provide continued operation without reset or reconfiguration of the shared input/output adapters of the storage system during the fail mode of the second main server. In particular, by employing shared I/O adapters, the hosts are not aware of any change between the main servers and the standby server in an event of the fail mode of the second main server, because paths between the hosts and the storage subsystem are not reconfigured or switched.

After the non-operational main server is updated and rebooted, the non-operational main server sends a signal to the operational main server that it is ready to rejoin the system, step 208.

Next, the standby server is deactivated in response to receiving a communication from the operational main server, step 210. In this regard, the operational main server enters a failback to service condition to regain control of the resources being handled by the standby server. In particular, the standby server relinquishes control of the resources and reboots, and control of the resources is transferred to the operational main server. After the failback to service is complete, the operational main server (e.g., the first main server 4) sends a request to the other main server (e.g., second main server 6) to rejoin the system, and control of the shared resources is transferred from the operational main server to the other main server, step 212.

In some cases, upgrade, maintenance or scheduled repair is to be performed on the operational main server (referred to above as the first main server 4). In these instances, steps 202 through 212 can be repeated. However, the first main server 4 performs the operations described in the method 200 associated with the non-operational main server and the second main server 6 performs the operations described in the method 200 associated with the operational main server.
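The sequence of steps 202 through 212, including the role swap when the other main server is serviced, can be summarized in a small Python sketch. The class `DualNodeSystem`, its method names, and the server labels "main1", "main2", and "standby" are hypothetical; the sketch only tracks which servers are active at each step and omits destaging and resource transfer.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    SERVICE = "failback to service"


class DualNodeSystem:
    def __init__(self):
        # Step 202: both main servers active, standby server inactive.
        self.mode = {"main1": Mode.ACTIVE, "main2": Mode.ACTIVE,
                     "standby": Mode.INACTIVE}

    def active(self):
        """Names of the servers currently in active mode, sorted."""
        return sorted(n for n, m in self.mode.items() if m is Mode.ACTIVE)

    def take_down(self, target):
        # Scheduled outage: the target voluntarily relinquishes shared resources.
        self.mode[target] = Mode.SERVICE

    def report_fail_mode(self, reporter, target):
        # Steps 204 and 206: the operational main server reports the fail mode
        # of the other main server, and the standby server activates.
        if self.mode[reporter] is not Mode.ACTIVE:
            raise RuntimeError("reporter must be an operational main server")
        if self.mode[target] is Mode.ACTIVE:
            raise RuntimeError("target is not in a fail mode")
        self.mode["standby"] = Mode.ACTIVE

    def rejoin(self, target):
        # Steps 208 to 212: the serviced server signals that it is ready, the
        # standby server is deactivated, and the serviced server rejoins.
        if self.mode[target] is not Mode.SERVICE:
            raise RuntimeError("target was not taken down for service")
        self.mode["standby"] = Mode.INACTIVE
        self.mode[target] = Mode.ACTIVE
```

Between `report_fail_mode` and `rejoin`, two servers are active at all times, which is the dual node property the method is meant to preserve. Calling the same methods with the labels exchanged models servicing the first main server instead.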

In another case, operation of one of the main servers may be interrupted unexpectedly. FIG. 3 is a flow chart diagram of a method 300 for storing data in a computer storage environment, according to another embodiment. Before operation of one of the main servers is interrupted, both main servers (e.g., first and second main servers 4, 6) operate in an active mode, and the standby server (e.g., standby server 20) operates in an inactive mode, step 302. In particular, the first and second main servers operate in a dual node configuration to manage and perform input/output operations communicated from the shared I/O adapters (e.g., shared I/O adapters 30, 32, 34, 36). The input/output operations are temporarily stored in memory storage areas of the first and second main servers. For example, each main server contains a copy of its own data stored in its cache and a copy of modified cache data of the other main server in its NVS. Thus, for example, the first main server 4 would include a copy of its data in cache 10 and a copy of modified cache data of the second main server 6 in NVS 12, while the second main server 6 would include a copy of its data in cache 16 and a copy of modified cache data of the first main server 4 in NVS 18.

When one of the main servers, for example, the second main server 6, unexpectedly becomes non-operational, the system enters a “failover” condition during which an operational main server takes control of shared resources from the non-operational main server without permission from the non-operational main server. During the failover condition, the data stored on the NVS of the operational main server (e.g., the modified cache data) is destaged to the DASD, in an embodiment. In this way, the modified cache data is committed and the redundant data on the non-operational main server can be lost with minimal impact. Additionally, control of the resources previously shared between the two main servers is transferred to the operational server.
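The failover condition described above can be sketched in Python as follows. The names `MainServer` and `failover` and the resource labels are hypothetical, and a dictionary stands in for the DASD; the sketch assumes that destaging simply copies the surviving server's NVS contents to disk before the NVS is cleared.

```python
from dataclasses import dataclass, field


@dataclass
class MainServer:
    name: str
    nvs: dict = field(default_factory=dict)       # partner's modified cache data
    resources: set = field(default_factory=set)   # shared resources it controls
    operational: bool = True


def failover(survivor, failed, dasd):
    """Unplanned failover: commit mirrored data, then take over all resources."""
    failed.operational = False
    # Destage: the survivor's NVS holds the failed server's modified cache
    # data, so writing it to disk commits that data.
    dasd.update(survivor.nvs)
    survivor.nvs.clear()
    # Control is taken without permission from the failed server.
    survivor.resources |= failed.resources
    failed.resources = set()
```

After `failover` returns, the data that existed only in the failed server's cache and the survivor's NVS is on disk, which is why the loss of the failed server's copy has minimal impact.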

After failover to the operational main server is complete, the operational main server (e.g., first main server 4) provides a communication to the standby server indicating a fail mode of the non-operational main server (e.g., failover condition of the system 2), step 304. In response to receiving the communication, the standby server activates, step 306. The standby server gains control of all of the operations performed by the non-operational main server and of the portion of the shared resources previously controlled by the non-operational main server to provide full redundancy for the system. Specifically, the standby server assumes the identity of the non-operational main server to provide continued operation without reset or reconfiguration of the shared input/output adapters of the storage system during the fail mode of the second main server. After the non-operational main server is repaired, it sends a signal to the operational main server that it is ready to rejoin the storage system, step 308.

Next, the standby server is deactivated in response to receiving a communication from the operational main server, step 310. In this regard, the operational main server enters a failback to service condition to regain control of the resources being handled by the standby server. In particular, the standby server relinquishes control of the resources and reboots, and control of the resources is transferred to the operational main server. After the failback to service is complete, the operational main server (e.g., the first main server 4) sends a request to the other main server (e.g., second main server 6) to rejoin the storage system, step 312.

By including the standby server and by configuring the standby server to be capable of assuming the identity of a main server, the storage system continuously operates in the dual node configuration despite an event in which one of the main servers experiences a fail mode. As a result, the storage system has improved redundancy during code loads, MES upgrades, server level repair actions and/or failure scenarios. Moreover, system performance is maintained during such circumstances.

As will be appreciated by one of ordinary skill in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer-readable medium(s) may be utilized. The computer-readable medium may be a computer-readable signal medium or a physical computer-readable storage medium. A physical computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, crystal, polymer, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Examples of a physical computer-readable storage medium include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, RAM, ROM, an EPROM, a Flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain, or store a program or data for use by or in connection with an instruction execution system, apparatus, or device.

Computer code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing. Computer code for carrying out operations for aspects of the present invention may be written in any static language, such as the “C” programming language or other similar programming language. The computer code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, or communication system, including, but not limited to, a local area network (LAN) or a wide area network (WAN), Converged Network, or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described above with reference to flow diagrams and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flow diagrams and/or block diagrams, and combinations of blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flow diagram and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flow diagram and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flow diagram and/or block diagram block or blocks.

The flow diagrams and block diagrams in the above figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flow diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagram, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

1. A method of storing data, the method comprising: providing a standby server, a first main server, and a second main server to control shared input/output (I/O) adapters in a storage system, the standby server in communication with the first main server and the second main server, and the storage system configured to operate as a dual node active system; and activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server.
2. The method of claim 1, wherein the standby server assumes an identity of the second main server for continued operation without reset or reconfiguration of the shared input/output adapters of the storage system during the fail mode of the second main server.
3. The method of claim 1, further comprising: deactivating the standby server in response to receiving a communication from the first main server.
4. The method of claim 1, wherein: the fail mode comprises a service mode; and the method further comprises: destaging data from the second main server during the fail mode, and updating and rebooting the second main server during the fail mode.
5. The method of claim 4, further comprising deactivating the standby server in response to receiving a communication from the first main server, after the step of updating and rebooting.
6. The method of claim 1, wherein: the fail mode comprises a failover mode; and the method further comprises: destaging modified data from the first main server during the fail mode of the second server, and performing the step of activating the standby server, after the step of destaging data.
7. The method of claim 6, further comprising deactivating the standby server in response to receiving a communication from the first main server.
8. A system for storing data, the system comprising: a first main server, a second main server in communication with the first main server; a standby server in communication with the first main server and the second main server; and shared input/output adapters in communication with the standby server, the first main server, and the second main server, wherein: the storage system is configured to operate as a dual node system, and the standby server is adapted to activate in response to receiving a communication from the first main server of a fail mode of the second main server.
9. The system of claim 3, further comprising an input/output bay switch coupled to the standby server, the first main server, and the second main server, wherein the shared input/output adapters are coupled to the input/output bay switch.
10. The system of claim 9, wherein the standby server is further configured to assume an identity of the second main server for continued operation of the shared input/output adapters of the storage system during the fail mode of the second main server.
11. The system of claim 10, wherein the standby server is further configured to assume the identity of the second main server without host disruption or reconfiguration during the fail mode of the second main server.
12. The system of claim 8, wherein the standby server is further configured to deactivate in response to receiving a communication from the first main server.
13. A physical computer storage medium comprising a computer program product method for controlling a storage system comprising a first main server, a second main server in communication with the first main server, and a standby server in communication with the first main server and the second main server, wherein the storage system is configured to operate as a dual node system with shared input/output adapters, the physical computer storage medium comprising: computer code for activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server.
14. The physical computer storage medium of claim 13, further comprising computer code for commanding the standby server to assume an identity of the second main server for continued operation of input/output adapters of the storage system during the fail mode of the second main server.
15. The physical computer storage medium of claim 14, wherein the computer code for commanding the standby server includes computer code for operating the standby server without reconfiguration of the shared input/output adapters.
16. The physical computer storage medium of claim 13, further comprising computer code for deactivating the standby server in response to receiving a communication from the first main server.
17. The physical computer storage medium of claim 13, wherein: the fail mode comprises a service mode, and the physical computer storage medium further comprises computer code for updating and rebooting the second main server during the fail mode.
18. The physical computer storage medium of claim 17, further comprising computer code for deactivating the standby server in response to receiving a communication from the first main server.
19. The physical computer storage medium of claim 13, wherein: the fail mode comprises a failover mode; and the physical computer storage medium further comprises: computer code for destaging modified data from the first main server during the fail mode of the second main server, and computer code for performing the step of activating the standby server, after destaging data.
20. The physical computer storage medium of claim 19, further comprising computer code for deactivating the standby server in response to receiving a communication from the second main server.